-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TST: Consolidate tests that raise in groupby #50749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST: Consolidate tests that raise in groupby #50749
Conversation
…ure/consolidate_groupby_fails_tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! This is a good step toward consolidating these. However there are many removed tests that are not part of the added tests (and conversely, there are many added tests that were not originally there! That bit is great). I've identified which I believe are missing below, please let me know if you think any are mistaken or would like explanation as to why.
I think if we can get coverage with the following cases, it's be good.
- Multiple groupings (
.groupby(['a', 'b'])
) - observed=False for categorical
- Series selection after groupby (
.groupby(...)['column']
) - Groupings that aren't in-axis (
.groupby([0, 0, 1])
) - Groupings that are groupers (
.groupby(pd.Grouper(...))
) - NumPy functions (
.groupby(...).agg(np.mean)
)
…ure/consolidate_groupby_fails_tests
Thanks for the review @rhshadrach. I updated the tests, I believe now it should cover all the cases you mentioned (and more). Wondering if 28k tests for a single file isn't to much though ? Anyway, I think we are still missing some cases like grouping on the index or grouping on a dtype other than int or category. Not sure if you think it is necessary here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow - great work here! For any added tests, can you add # GH#50749
to the top of any new tests.
What's the runtime of this file on your machine?
pandas/tests/groupby/test_raises.py
Outdated
|
||
|
||
@pytest.fixture(params=[True, False]) | ||
def groupby_serie(request): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: groupby_series
pandas/tests/groupby/test_raises.py
Outdated
|
||
if groupby_serie: | ||
if groupby_func == "corrwith": | ||
pytest.skip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why skip here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the corrwith method is not implemented for SeriesGroupBy. We could test for an AttributeError, but to me it makes more sense to skip it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should assert not hasattr(gb, 'corrwith')
and return. By skipping, if corrwith
is ever added then this test will always not fail. On the other hand, if you add the assert, then if it is added this test will fail and the author will know to update it.
On my machine, I get : Also, do you know why the CI pipeline is failing ? I see some failed tests in pandas/tests, but I don't see the connection with this PR ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. I am concerned though with the runtime for this. Can you remove sort
and as_index
from the tests - I think it's safe to test without these and would reduce the runtime by ~3/4ths.
Also, do you know why the CI pipeline is failing ? I see some failed tests in pandas/tests, but I don't see the connection with this PR ?
This appears unrelated to me too. This is okay to ignore.
pandas/tests/groupby/test_raises.py
Outdated
|
||
if groupby_serie: | ||
if groupby_func == "corrwith": | ||
pytest.skip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should assert not hasattr(gb, 'corrwith')
and return. By skipping, if corrwith
is ever added then this test will always not fail. On the other hand, if you add the assert, then if it is added this test will fail and the author will know to update it.
…ure/consolidate_groupby_fails_tests
Done. The new runtime on my machine is : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks @albuzenet - great work! |
Thanks for the review @rhshadrach ! |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Part of #50133. Continue the work of @rhshadrach started in #50404 for the groupby consolidation.